In this notebook we analyse the Crime Incident Reports for the city of Boston, US, for the years 2016-2018. The full dataset contains records of events from June 2015 to January 2019. These reports are provided by the Boston Police Department (BPD) "to document the initial details surrounding an incident to which BPD officers respond" (Source).
Since more than 355k events are registered in this dataset, we will focus only on the incidents related to motor vehicles.
Police responses are logged and reported chronologically by the Boston Police Department.
We will be focusing on the "OFFENSE_CODE_GROUP" column/feature, which is the "internal categorization" of the incident.
The "OCCURRED_ON_DATE" column will be one of our most valuable assets: from it we will look at days, months and years. The "INCIDENT_NUMBER" column will help us count events. "SHOOTING" indicates whether a shooting took place.
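As a minimal sketch of how these three columns can be used, the snippet below parses the timestamps, derives calendar parts, and counts events. The `sample` frame is a hypothetical stand-in for the real data (the third row is made up for illustration), and it assumes the dates arrive as strings.

```python
import pandas as pd

# Tiny hypothetical sample mimicking three of the dataset's columns.
# The third row is invented for illustration only.
sample = pd.DataFrame({
    "INCIDENT_NUMBER": ["I192004100", "I192004093", "I182000001"],
    "OCCURRED_ON_DATE": [
        "2019-01-15 20:39:00",
        "2019-01-15 21:18:00",
        "2018-06-03 08:05:00",
    ],
    "SHOOTING": [None, None, "Y"],
})

# Parse the timestamp strings so the .dt accessor becomes available.
sample["OCCURRED_ON_DATE"] = pd.to_datetime(sample["OCCURRED_ON_DATE"])

# Derive calendar parts from the parsed timestamps.
sample["YEAR"] = sample["OCCURRED_ON_DATE"].dt.year
sample["MONTH"] = sample["OCCURRED_ON_DATE"].dt.month
sample["DAY_OF_WEEK"] = sample["OCCURRED_ON_DATE"].dt.day_name()

# Count events per year by counting incident numbers.
per_year = sample.groupby("YEAR")["INCIDENT_NUMBER"].count()

# Count rows where a shooting was recorded ("Y"); missing values are not counted.
shootings = sample["SHOOTING"].eq("Y").sum()
```

Parsing once up front is what makes expressions such as `.dt.hour` and `.dt.month` possible later on.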
df.head()
| | INCIDENT_NUMBER | OFFENSE_CODE | OFFENSE_CODE_GROUP | OFFENSE_DESCRIPTION | DISTRICT | REPORTING_AREA | SHOOTING | OCCURRED_ON_DATE | YEAR | MONTH | DAY_OF_WEEK | HOUR | UCR_PART | STREET | Lat | Long | Location |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | I192004100 | 2907 | Violations | VAL - OPERATING AFTER REV/SUSP. | NaN | | NaN | 2019-01-15 20:39:00 | 2019 | 1 | Tuesday | 20 | Part Two | NaN | 42.306714 | -71.087418 | (42.30671431, -71.08741801) |
| 1 | I192004093 | 3006 | Medical Assistance | SICK/INJURED/MEDICAL - PERSON | E13 | 906 | NaN | 2019-01-15 21:18:00 | 2019 | 1 | Tuesday | 21 | Part Three | WALDEN ST | 42.325610 | -71.104500 | (42.32561013, -71.10449956) |
| 2 | I192004088 | 619 | Larceny | LARCENY ALL OTHERS | C11 | 402 | NaN | 2019-01-15 21:30:00 | 2019 | 1 | Tuesday | 21 | Part One | BURT ST | 42.284135 | -71.069574 | (42.28413536, -71.06957385) |
| 3 | I192004086 | 3201 | Property Lost | PROPERTY - LOST | C11 | 387 | NaN | 2019-01-15 20:48:00 | 2019 | 1 | Tuesday | 20 | Part Three | ADAMS ST | 42.272306 | -71.067214 | (42.27230624, -71.06721386) |
| 4 | I192004085 | 3108 | Fire Related Reports | FIRE REPORT - HOUSE, BUILDING, ETC. | D4 | 285 | NaN | 2019-01-15 20:29:00 | 2019 | 1 | Tuesday | 20 | Part Three | TREMONT ST | 42.336409 | -71.085650 | (42.33640891, -71.08565039) |
df['OFFENSE_CODE_GROUP'].value_counts().head(10)
Motor Vehicle Accident Response    41416
Larceny                            29048
Medical Assistance                 26513
Investigate Person                 20717
Other                              20019
Drug Violation                     18358
Simple Assault                     17668
Vandalism                          17024
Verbal Disputes                    14660
Towed                              12558
Name: OFFENSE_CODE_GROUP, dtype: int64
sns.catplot(y='OFFENSE_CODE_GROUP',
kind='count',
height=8,
aspect=1.5,
order=df['OFFENSE_CODE_GROUP'].value_counts().head(10).index,
data=df);
*Please keep in mind the visualizations below will display event counts for all three years, unless otherwise specified.
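The plots that follow read from a DataFrame named `auto`, but the cell that builds it is not shown here. A minimal sketch of how such a subset could be created is given below; the `VEHICLE_GROUPS` list is an assumption, since the notebook does not say which "OFFENSE_CODE_GROUP" values it keeps, and `df_sample` is a hypothetical stand-in for the full `df`.

```python
import pandas as pd

# Hypothetical list of motor-vehicle categories. The notebook does not show
# which OFFENSE_CODE_GROUP values its `auto` DataFrame actually keeps.
VEHICLE_GROUPS = [
    "Motor Vehicle Accident Response",
    "Towed",
    "Auto Theft",
]

# Small stand-in for the full `df`, with made-up incident numbers.
df_sample = pd.DataFrame({
    "INCIDENT_NUMBER": ["A1", "A2", "A3", "A4", "A5"],
    "OFFENSE_CODE_GROUP": [
        "Towed",
        "Larceny",
        "Motor Vehicle Accident Response",
        "Auto Theft",
        "Vandalism",
    ],
})

# Keep only the rows whose category is in the chosen list.
# .copy() avoids chained-assignment warnings if columns are added later.
auto_sample = df_sample[df_sample["OFFENSE_CODE_GROUP"].isin(VEHICLE_GROUPS)].copy()

print(auto_sample["OFFENSE_CODE_GROUP"].value_counts())
```

With the real data, the same `isin` filter applied to `df` would yield the `auto` frame, provided the list matches the categories the analysis intends to cover.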
sns.set(rc={'figure.figsize':(15,6)})
sns.countplot(x='OFFENSE_CODE_GROUP',data=auto, order=auto['OFFENSE_CODE_GROUP'].value_counts().index)
plt.xticks(rotation=45)
plt.ylabel('No of Incidents')
plt.xlabel("");
plt.title("Incident Categories", size=35)
plt.show()
sns.catplot(x='HOUR',
kind='count',
height=8.27,
aspect=3,
color='lightblue',
data=auto)
plt.xticks(size=20)
plt.yticks(size=20)
plt.xlabel('Hour', fontsize=30);
plt.ylabel('Count', fontsize=30);
plt.title("Incidents per Hour in the Day", size=50);
auto.groupby([auto['OCCURRED_ON_DATE'].dt.hour,'OFFENSE_CODE_GROUP',])['INCIDENT_NUMBER'].count().unstack().plot(marker='o', figsize=(15,10))
plt.ylabel('No of Incidents');
plt.xlabel('Hour of the day');
plt.legend(fontsize="x-large");
plt.xticks(np.arange(24));
plt.title("Incidents per Hour in the Day", size=40);
sns.catplot(x='DAY_OF_WEEK',
kind='count',
height=8,
aspect=3,
data=auto)
plt.xticks(size=30)
plt.yticks(size=30)
plt.xlabel('');
plt.ylabel('Count', fontsize=40);
plt.title("Incidents per Day of the Week", size=50);
months = ['Jan','Feb','Mar','Apr','May','Jun','Jul','Aug','Sep','Oct','Nov','Dec']
sns.catplot(x='MONTH',
kind='count',
height=8,
aspect=3,
color='lightblue',
data=auto)
plt.xticks(np.arange(12), months, size=30)
plt.yticks(size=30)
plt.xlabel('');
plt.ylabel('Count', fontsize=40);
plt.title("Incidents per Month of the Year", size=50);
months=['', 'Jan','Feb','Mar','Apr','May','Jun','Jul','Aug','Sep','Oct','Nov','Dec']
auto.groupby('MONTH')['INCIDENT_NUMBER'].count().plot(marker='o', color='red', linewidth=2, markersize=12, markerfacecolor='lightblue', figsize=(15, 5))
plt.xticks(np.arange(0,13, 1),months)
plt.ylabel('No of Incidents');
plt.title("Incidents per Month of the Year", size=40);
months=['','Jan','Feb','Mar','Apr','May','Jun','Jul','Aug','Sep','Oct','Nov','Dec']
auto.groupby([auto['OCCURRED_ON_DATE'].dt.month,'OFFENSE_CODE_GROUP',])['INCIDENT_NUMBER'].count().unstack().plot(marker='o', figsize=(15,10));
plt.ylabel('No of Incidents');
plt.legend(fontsize="x-large");
plt.xlabel("")
plt.xticks(np.arange(0, 13, 1),months);
plt.title("Types of Incidents per Month of the Year", size=37);
The only months in which the number of incidents was higher in 2018 than in 2017 were June and November.
auto.groupby(['MONTH','YEAR'])['INCIDENT_NUMBER'].count().unstack().plot(kind='bar', figsize=(15, 6));
plt.ylabel('No of Incidents');
plt.xlabel("Month of the year");
plt.legend(loc='center left', bbox_to_anchor=(1, .8), fontsize="x-large");
plt.title("Incidents per Month by Year", size=37);
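The 2017 versus 2018 comparison can also be checked numerically instead of being read off the bars. The sketch below builds the same month-by-year table on a toy frame; the rows in `toy` are made-up values for illustration, not figures from the dataset.

```python
import pandas as pd

# Made-up (month, year) pairs standing in for rows of `auto`.
rows = [
    (6, 2017), (6, 2018), (6, 2018),
    (7, 2017), (7, 2017), (7, 2018),
    (11, 2017), (11, 2018), (11, 2018),
]
toy = pd.DataFrame(rows, columns=["MONTH", "YEAR"])
toy["INCIDENT_NUMBER"] = [f"T{i}" for i in range(len(toy))]

# Same month-by-year count table that feeds the bar chart.
by_month_year = toy.groupby(["MONTH", "YEAR"])["INCIDENT_NUMBER"].count().unstack()

# Months in which 2018 had more incidents than 2017.
mask = by_month_year[2018] > by_month_year[2017]
higher_in_2018 = by_month_year[mask].index.tolist()
```

Running the same two final steps on the real `auto` frame would list the months where 2018 exceeded 2017.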
sns.set(rc={'figure.figsize':(10,6)})
sns.countplot(x='OFFENSE_CODE_GROUP',data=auto[auto.SHOOTING== "Y"], order=auto['OFFENSE_CODE_GROUP'].value_counts().index)
plt.xticks(rotation=45)
plt.title('No of Incidents where there were shootings')
plt.show()
# Create basic Folium crime map
import folium
from folium.plugins import HeatMap

crime_map = folium.Map(location=[42.3125, -71.0875],
                       zoom_start=11)
# Add data for the heatmap: keep only rows that have both coordinates
auto_heatmap = auto[['Lat', 'Long']].dropna()
auto_heatmap = auto_heatmap.values.tolist()
# Cap the heatmap at the first 50,000 points
HeatMap(auto_heatmap[:50000], radius=10).add_to(crime_map)
# Plot!
crime_map
# These are the last 2000 incidents. They appear to be spread all over the city.
marker_map = folium.Map(width=800,
                        height=500,
                        location=[42.33, -71.070],
                        zoom_start=12)
# Drop rows without coordinates up front instead of swallowing every exception
recent = auto.head(2000).dropna(subset=['Lat', 'Long'])
for _, row in recent.iterrows():
    folium.Marker([row['Lat'], row['Long']], popup=row['STREET']).add_to(marker_map)
marker_map
# The farther you stay from these coordinates, the safer!
hotspot_map = folium.Map(width=800, height=500, location=[42.33, -71.070], zoom_start=12)
folium.Marker([42.32696647, -71.06198607]).add_to(hotspot_map)
folium.Marker([42.33152148, -71.07085307]).add_to(hotspot_map)
folium.Marker([42.36067984, -71.05482325]).add_to(hotspot_map)
folium.Marker([42.32809966, -71.06321676]).add_to(hotspot_map)
folium.Marker([42.36183857, -71.05976489]).add_to(hotspot_map)
hotspot_map
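The notebook does not say how these five coordinates were picked. One plausible approach, sketched here purely as an assumption, is to count how often each exact (Lat, Long) pair occurs and keep the most frequent ones. The `coords` frame holds hypothetical values standing in for the "Lat" and "Long" columns of `auto`.

```python
import pandas as pd

# Hypothetical coordinates standing in for the Lat/Long columns of `auto`.
coords = pd.DataFrame({
    "Lat":  [42.3270, 42.3270, 42.3270, 42.3315, 42.3315, 42.3607, None],
    "Long": [-71.0620, -71.0620, -71.0620, -71.0709, -71.0709, -71.0548, None],
})

# Count incidents per exact coordinate pair, ignoring rows without a location,
# then keep the two most frequent pairs.
hotspots = (
    coords.dropna(subset=["Lat", "Long"])
          .groupby(["Lat", "Long"])
          .size()
          .sort_values(ascending=False)
          .head(2)
)

print(hotspots)
```

Exact-match counting only groups incidents logged at identical coordinates; rounding the coordinates first would be one way to merge nearby points, at the cost of precision.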